Wednesday, October 20, 2004 2:02 PM Barschall 1-91 4-332-7719 p.03 


IN THE CLAIMS: 

1. (currently amended) A video conferencing system comprising:
a stationary image pickup device, remaining motionless during operation, for generating
image signals representative of an image;
an audio pickup device for generating audio signals representative of sound from an
audio source; and
means for processing said image signals and said audio signals to determine a direction
of the audio source relative to a reference point, the determination of direction depending at least
at times on the image signals.

2. (previously presented) The video conferencing system of claim 1 wherein said processing
means comprises:
an audio source localization system;
a computer vision person detection system; and
a multimodal speaker detection system.


3. (currently amended) The video conferencing system of claim 2, further comprising an
integrated housing for an integrated video conferencing system incorporating the image pickup
device, the audio pickup device, and the processing means [[multimodal integration architecture
system]].

4. (original) The video conferencing system of claim 3, wherein the integrated housing is sized 
for being portable.

5. (previously presented) The video conferencing system of claim 1, further comprising an
electronic pan tilt zoom system for electronically manipulating the image signals to effectively
provide at least one of variable pan, tilt, and zoom functions.

6. (previously presented) The video conferencing system of claim 1, wherein the image pickup 
device is a stationary camera that remains motionless during operation of the video conferencing 
system. 

7. (currently amended) The video conferencing system of claim 1, wherein the processing means
provides control signals to an electronic pan tilt zoom system.

8. (previously presented) The video conferencing system of claim 2, wherein the audio source
localization system detects the movement of the audio source when the audio source moves
relative to the reference point, and, in response to the movement, the audio source localization
system causes a change in a field of view of the image pickup device.

9. (previously presented) The video conferencing system of claim 1, wherein the audio pickup
device is comprised of an array of two microphones.

10. (currently amended) A method comprising the steps of:
generating, at a stationary image pickup device, remaining motionless during operation,
image signals representative of an image;
generating, at an audio pickup device, audio signals representative of sound from an
audio source;
processing the image signals and the audio signals to determine a direction of the audio
source relative to a reference point, the determination of direction depending at least at times on
the image signals;
manipulating the image signals to produce refined image signals depending on the
determined direction; and
outputting said refined image signals.

11. (currently amended) The method of claim 10 further comprising the steps of:
applying said audio signals to an audio source localization system;
applying said image signals to a computer vision person detection system;
processing said audio signals and said image signals with a multimodal speaker detection
system to determine the direction of the audio source;
generating control signals based on the determined direction of the audio source[[, the
determination depending at least at times on the image signals]];
applying the control signals to an electronic pan tilt zoom system to mimic the effect of at
least one function of a movable camera, said function selected from the group consisting
panning, tilting, and zooming said movable camera; and
providing an output from said electronic pan tilt zoom system.

12. (previously presented) The method of claim 10, wherein manipulating the image signals 
includes varying a field of view of the image pickup device in response to the control signals.

13. (original) The method of claim 10, wherein processing the audio signals includes
determining an audio based direction of the audio source based on the audio signals.

14. (previously presented) The method of claim 10, wherein processing the audio signals
includes detecting the movement of the audio source when the audio source moves; and
manipulating the image signals includes causing electronically, in response to the
movement, a variation in a field of view of the image pickup device.

15. (previously presented) The method of claim 13, wherein processing the image signals
includes generating control signals depending on the audio based direction, and manipulating the
image includes electronically panning, tilting, and/or zooming said image pickup device
depending on the control signals.

16. (previously presented) A video conferencing system comprising:
microphones for generating audio signals representative of sound from a speaker;
a stationary video camera, remaining motionless during operation, for generating video
signals representative of a video image;
an electronic pan tilt zoom system for manipulating video images to produce the visual
effects of panning, tilting, and/or zooming;
a processor for processing the video signals and the audio signals to determine a direction
of a speaker relative to a reference point and supplying control signals to the electronic pan tilt
zoom system for producing images that include the speaker in the field of view of the camera,
the determination of direction depending at least at times on the video signals, the control signals
being generated based on the determined direction of the speaker; and
a transmitter for transmitting audio and video signals for video conferencing.

17. (previously presented) The video conferencing system of claim 1, wherein at times the
determination of the direction of the audio source depends on both the image signals and the 
audio signals. 

18. (previously presented) The video conferencing system of claim 1, wherein the processing
includes determining the movement of the audio source depending at least at times on the image 
signals. 

19. (previously presented) The video conferencing system of claim 1, wherein the processing
includes tracking the position of the audio source when the audio source moves, the tracking
depending at least at times on the image signals.

20. (previously presented) The video conferencing system of claim 2, wherein the computer
vision person detection system detects the movement of the audio source when the audio source
moves relative to the reference point, and, in response to the movement, the computer vision
person detection system causes a change in a field of view of the image pickup device.

21. (previously presented) The method of claim 10, wherein processing the image signals further
includes:
detecting the movement of the audio source when the audio source moves; and
causing electronically, in response to the movement, a variation in a field of view of the
image pickup device.

22. (previously presented) The method of claim 10, wherein the processing includes determining 
the movement of the audio source depending at least at times on the image signals. 

23. (previously presented) The method of claim 10, wherein the processing includes tracking the
position of the audio source when the audio source moves, the tracking depending at least at
times on the image signals.

24. (currently amended) A video conferencing system, comprising:
a stationary image pickup device, remaining motionless during operation, for generating
image signals representative of an image;
an audio pickup device for generating audio signals representative of sound from an
audio source;
means for processing the image signals and the audio signals to determine a direction of
the audio source relative to a reference point, the determination of direction depending at least at
times on the image signals;
means for manipulating the image signals to produce refined image signals depending on
the determined direction; and
an output for outputting said refined image signals.

25. (previously presented) The video conferencing system of claim ^ wherein the array of
microphones includes only two microphones.

